1 - Aims

Twitter data has recently been one of the most favorite dataset for Natural Language Processing (NLP) researchers. In this assignment analysed tweets that were collected during the airing of the 'Red Wedding' episode of Game of Thrones uing Natural Language Processing.

2 - Background

Natural language processing (NLP) is a branch of artificial intelligence that helps computers understand, interpret and manipulate human language. Basic NLP tasks include tokenization and parsing, lemmatization/stemming, part-of-speech tagging, language detection and identification of semantic relationships.

3 - Setup Environment

In [754]:
import numpy as np
import pandas as pd 
import re, string, unicodedata
import nltk
from nltk import word_tokenize, FreqDist
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
nltk.download
nltk.download('wordnet')
nltk.download('stopwords')
from nltk.tokenize import TweetTokenizer
pd.set_option('display.max_columns', None)
data = pd.read_csv('got_tweets.csv')
data.head(5)
!pip install emoji
!pip install num2words
from num2words import num2words
from nltk.tokenize import word_tokenize
!pip install --upgrade gensim
from gensim import corpora, models
from nltk.stem import PorterStemmer
from nltk.stem import LancasterStemmer
!pip install wordcloud
from wordcloud import WordCloud
from os import path
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
[nltk_data] Downloading package wordnet to /home/fmx2hx/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package stopwords to /home/fmx2hx/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: emoji in /home/fmx2hx/.local/lib/python3.7/site-packages (1.2.0)
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: num2words in /home/fmx2hx/.local/lib/python3.7/site-packages (0.5.10)
Requirement already satisfied: docopt>=0.6.2 in /home/fmx2hx/.local/lib/python3.7/site-packages (from num2words) (0.6.2)
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: gensim in /home/fmx2hx/.local/lib/python3.7/site-packages (4.0.1)
Requirement already satisfied: scipy>=0.18.1 in /opt/conda/lib/python3.7/site-packages (from gensim) (1.6.2)
Requirement already satisfied: smart-open>=1.8.1 in /home/fmx2hx/.local/lib/python3.7/site-packages (from gensim) (5.0.0)
Requirement already satisfied: numpy>=1.11.3 in /home/fmx2hx/.local/lib/python3.7/site-packages (from gensim) (1.20.3)
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: wordcloud in /home/fmx2hx/.local/lib/python3.7/site-packages (1.8.1)
Requirement already satisfied: matplotlib in /home/fmx2hx/.local/lib/python3.7/site-packages (from wordcloud) (3.3.4)
Requirement already satisfied: numpy>=1.6.1 in /home/fmx2hx/.local/lib/python3.7/site-packages (from wordcloud) (1.20.3)
Requirement already satisfied: pillow in /home/fmx2hx/.local/lib/python3.7/site-packages (from wordcloud) (8.2.0)
Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.7/site-packages (from matplotlib->wordcloud) (0.10.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in /opt/conda/lib/python3.7/site-packages (from matplotlib->wordcloud) (2.4.7)
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib->wordcloud) (1.3.1)
Requirement already satisfied: python-dateutil>=2.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib->wordcloud) (2.8.1)
Requirement already satisfied: six in /home/fmx2hx/.local/lib/python3.7/site-packages (from cycler>=0.10->matplotlib->wordcloud) (1.15.0)
In [755]:
data
Out[755]:
id created_at created_at_shift from_user from_user_id from_user_id_str from_user_name id_str in_reply_to_status_id in_reply_to_status_id_str iso_language_code latitude longitude metadata place profile_image_url profile_image_url_https query source text to_user to_user_id to_user_id_str to_user_name type
0 3.416120e+17 6/3/2013 18:45 0 TheMadamEditor 337689639 337689639 madam-editor 3.416120e+17 NaN NaN en NaN NaN result_type=recent NaN http://a0.twimg.com/profile_images/1448601184/... https://si0.twimg.com/profile_images/144860118... #gameofthrones <a href="http://twitter.com/download/iphone">T... About to watch #GameOfThrones and I am tweaked. NaN NaN NaN NaN NaN
1 3.416120e+17 6/3/2013 18:45 0 nitaselimi 421347539 421347539 Nita Selimi 3.416120e+17 3.416110e+17 3.416110e+17 tl NaN NaN result_type=recent NaN http://a0.twimg.com/profile_images/3570330661/... https://si0.twimg.com/profile_images/357033066... #gameofthrones <a href="http://twitter.com/download/android">... @Grangjii gjith e kom dasht, veq ti je tu ma s... Grangjii 45957016.0 45957016.0 Granit Gjevukaj NaN
2 3.416120e+17 6/3/2013 18:45 0 dh_editorial 256671039 256671039 Dee @ EditorialEyes 3.416120e+17 NaN NaN en NaN NaN result_type=recent NaN http://a0.twimg.com/profile_images/1252656506/... https://si0.twimg.com/profile_images/125265650... #gameofthrones <a href="http://twitter.com/">web</a> Are there, like, House Stark/Tony Stark mashup... NaN NaN NaN NaN NaN
3 3.416120e+17 6/3/2013 18:45 0 theprint 809334 809334 Rasmus Rasmussen 3.416120e+17 NaN NaN en NaN NaN result_type=recent NaN http://a0.twimg.com/profile_images/1469678734/... https://si0.twimg.com/profile_images/146967873... #gameofthrones <a href="http://www.tweetdeck.com">TweetDeck</a> Reading #GameOfThrones reactions after last ni... NaN NaN NaN NaN NaN
4 3.416120e+17 6/3/2013 18:45 0 Mr_Twenty2 69222052 69222052 Marty Caan 3.416120e+17 NaN NaN en NaN NaN result_type=recent NaN http://a0.twimg.com/profile_images/1492961144/... https://si0.twimg.com/profile_images/149296114... #gameofthrones <a href="http://twitter.com/">web</a> I don't know if I'm impressed or disgusted! Br... NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
27330 3.416680e+17 6/3/2013 22:28 0 RocketQueen2x5 155242080 155242080 Funke Aleshe 3.416680e+17 NaN NaN en NaN NaN result_type=recent NaN http://a0.twimg.com/profile_images/3653474295/... https://si0.twimg.com/profile_images/365347429... game+of+thrones <a href="http://twitter.com/download/android">... 30mins after the episode has finished, Game o... NaN NaN NaN NaN NaN
27331 3.416680e+17 6/3/2013 22:28 0 Brianpmohan 51492815 51492815 Brian Mohan 3.416680e+17 NaN NaN en NaN NaN result_type=recent NaN http://a0.twimg.com/profile_images/3442035606/... https://si0.twimg.com/profile_images/344203560... game+of+thrones <a href="http://twitter.com/download/iphone">T... Actually fuming with game of thrones ha how sa... NaN NaN NaN NaN NaN
27332 3.416680e+17 6/3/2013 22:28 0 WestJamUnited 334073484 334073484 Jamie 3.416680e+17 NaN NaN en NaN NaN result_type=recent NaN http://a0.twimg.com/profile_images/3741402554/... https://si0.twimg.com/profile_images/374140255... game+of+thrones <a href="http://twitter.com/download/iphone">T... RT @WstonesOxfordSt: It's entirely possible th... NaN NaN NaN NaN NaN
27333 3.416680e+17 6/3/2013 22:28 0 PAHarper 106702255 106702255 Phil Harper 3.416680e+17 NaN NaN en NaN NaN result_type=recent NaN http://a0.twimg.com/profile_images/1267004068/... https://si0.twimg.com/profile_images/126700406... game+of+thrones <a href="http://twitter.com/download/iphone">T... RT @WstonesOxfordSt: It's entirely possible th... NaN NaN NaN NaN NaN
27334 3.416680e+17 6/3/2013 22:28 0 ShrimpWonder 365207282 365207282 boundarymembranes 3.416680e+17 NaN NaN en NaN NaN result_type=recent NaN http://a0.twimg.com/profile_images/3672468820/... https://si0.twimg.com/profile_images/367246882... game+of+thrones <a href="http://twitter.com/">web</a> RT @WstonesOxfordSt: It's entirely possible th... NaN NaN NaN NaN NaN

27335 rows × 25 columns

4 - Most Common Hastags

Parsed all the hashtags out of the texts, then counted their overall occurrences and found 10 most common hashtags. Made a bar chart of these top 10 hashtags.

Firstly using Regex (Regular Expressions) selected hastags and assigned to data[hashtag] column Calculated the length of the list / number of entries! (! This is with duplicate entries) Transformed the list of hastags into a dataframe Renamed the columns of the dataframe, initialized punctuations string. Calculated the number of time a hashtag appear in the tweets. Ordered the dataframe with the highest number of words at the top. Droped the duplicates of the words.Showed only the top ten first word in bar chart.

In [756]:
# Firstly using Regex (Regular Expressions) selected hastags and assigned to data[hashtag] column
data['hashtag'] = data['text'].apply(lambda x: re.findall(r"(#\S+)", x.lower()))
data.head(5)
hashtags = []

for sublist in data.hashtag:
    for word in sublist:
        hashtags.append(word)
In [757]:
#hashtags
In [758]:
# Calculate the length of the list / number of entries! (! This is with duplicate entries)
nb_hashtags = len(hashtags)
print("Total number of hashtags is " + str(nb_hashtags) + ".")
Total number of hashtags is 27724.
In [759]:
# Transform the list of hastags into a dataframe
hashtags_df = pd.DataFrame(hashtags)
# Rename the columns of the dataframe
hashtags_df.columns = ['hashtags']
In [760]:
## initializing punctuations string 
hashtags_df['hashtags'] = hashtags_df.hashtags.str.replace('[^\w\s]','')
/home/fmx2hx/.conda/envs/conda/lib/python3.7/site-packages/ipykernel/__main__.py:2: FutureWarning: The default value of regex will change from True to False in a future version.
  from ipykernel import kernelapp as app
In [761]:
# Calculate the number of time a hashtag appear in the tweets
hashtags_df['count'] = hashtags_df.groupby('hashtags')['hashtags'].transform('count')
In [762]:
hashtags_df['count']
Out[762]:
0        15299
1        15299
2        15299
3            2
4        15299
         ...  
27719        1
27720        1
27721       10
27722     2185
27723      143
Name: count, Length: 27724, dtype: int64
In [763]:
# Order the dataframe with the highest number of words at the top. 
hashtags_df = hashtags_df.sort_values(by=['count'], ascending=False)
# Drop the duplicates of the words! Show only the top ten first word!
hashtags_df = hashtags_df.drop_duplicates(['hashtags'], keep='first')
In [764]:
hashtags_df[:10]
Out[764]:
hashtags count
0 gameofthrones 15299
6516 redwedding 4245
20831 got 2185
17840 therainsofcastamere 320
18362 rainsofcastamere 166
2376 getglue 143
23869 wtf 104
764 theredwedding 74
11663 hbo 70
8933 robbstark 66
In [765]:
ax=hashtags_df[:10].plot.bar(x='hashtags', y='count', rot=90)
ax.set_ylabel('count of hashtags')
Out[765]:
Text(0, 0.5, 'count of hashtags')
In [766]:
noduplicate_hashtags = len(hashtags_df)
print("Total number of single hashtags is " + str(noduplicate_hashtags) + ".")
Total number of single hashtags is 2569.

5 - Tokenize the text of the tweets

Tokenize the text of the tweets, and gather the 'real' words for each tweet.

By 'real' words, there should be:

  • no punctuations
  • hashtags only without # mark
  • no user mentions
  • no URLs
  • no emojis
  • no numbers

Count word occurrences, make a histogram of the occurrences. What are the top words? Are they what you expected?

What crazy words did you get? Explain possible approaches, with which you could throw out this kind of junk text as well.

Firstly removed emojis. Then removed url and mentions using preprocessor library. removed punctiation. tokenized words and converted lowercase removed stopwords and numeric characters. Took all the tokens (separated words) and append them in a list I have a giant list of all words that have been writen in the tweets Calculated the length of the list / number of entries! Turned the list into a dataframe Calculated the number of time a word appear in the tweets Order the dataframe with the highest number of words at the top. Add the frequency of occurance of the words! Drop the duplicates of the words! Show only the top ten first word! Top ten words are 'game', 'thrones', 'episode', 'last', 'im', 'de', 'fuck', 'wedding', 'watching', 'still' I think that they are very related to game of thrones as I expected.

In [767]:
#!pip install emoji
#Encodes all the data into ASCII values #and ignore if the data can not be encoded. 
#After encoding it tries to decode them all again
#because all the emojis were ignored in the encoding process. 
#So now we have all the data without emojis.
#import emoji
data = data.astype(str).apply(lambda x: x.str.encode('ascii', 'ignore').str.decode('ascii'))
STOP_WORDS = stopwords.words('english')
In [768]:
import preprocessor as p #Adding the cleaned (After removal of URLs, Mentions) tweets to a new column as a new feature ‘text

def preprocess_tweet(row):
    text = row['text']
    text = p.clean(text)
    return text
data['textclean'] = data.apply(preprocess_tweet, axis=1)

data['textclean']
#After removal of URLs, Mentions
Out[768]:
0                         About to watch and I am tweaked.
1        gjith e kom dasht, veq ti je tu ma shti nqef e...
2        Are there, like, House Stark/Tony Stark mashup...
3        Reading reactions after last night's episode i...
4        I don't know if I'm impressed or disgusted! Br...
                               ...                        
27330    mins after the episode has finished, Game of T...
27331    Actually fuming with game of thrones ha how sa...
27332    : It's entirely possible that Twitter just wat...
27333    : It's entirely possible that Twitter just wat...
27334    : It's entirely possible that Twitter just wat...
Name: textclean, Length: 27335, dtype: object
In [769]:
def remove_punct(text): #remove punctiation
    text_nopunct = ''.join([char for char in text if char not in string.punctuation])
    return text_nopunct
data['textcleanpunct'] = data['textclean'].apply(lambda x: remove_punct(x))
# this is how you can drop any column. --> train_data.drop('text_clean', axis=1, inplace=True)
data['textcleanpunct']
Out[769]:
0                          About to watch and I am tweaked
1        gjith e kom dasht veq ti je tu ma shti nqef ed...
2        Are there like House StarkTony Stark mashups o...
3        Reading reactions after last nights episode is...
4        I dont know if Im impressed or disgusted Brill...
                               ...                        
27330    mins after the episode has finished Game of Th...
27331    Actually fuming with game of thrones ha how sa...
27332     Its entirely possible that Twitter just watch...
27333     Its entirely possible that Twitter just watch...
27334     Its entirely possible that Twitter just watch...
Name: textcleanpunct, Length: 27335, dtype: object
In [770]:
#function to tokenize words 
def tokenize(text):
    tokens = re.split('\W+',text) #W+ means that either a word character (A-Z) or a dash(-) can go there.
    return tokens

#converting to lowercase as python is case-sensitive
data['textlower'] = data['textcleanpunct'].apply(lambda x: tokenize(x.lower()))
data['textlower']
Out[770]:
0                  [about, to, watch, and, i, am, tweaked]
1        [gjith, e, kom, dasht, veq, ti, je, tu, ma, sh...
2        [are, there, like, house, starktony, stark, ma...
3        [reading, reactions, after, last, nights, epis...
4        [i, dont, know, if, im, impressed, or, disgust...
                               ...                        
27330    [mins, after, the, episode, has, finished, gam...
27331    [actually, fuming, with, game, of, thrones, ha...
27332    [, its, entirely, possible, that, twitter, jus...
27333    [, its, entirely, possible, that, twitter, jus...
27334    [, its, entirely, possible, that, twitter, jus...
Name: textlower, Length: 27335, dtype: object
In [771]:
stopword = nltk.corpus.stopwords.words('english') #all english stopwords 
#function to remove stopwords and numeric characters and return length of word >1
def remove_stopwords(tokenized_list):
    text = [word for word in tokenized_list if word not in stopword and word.isalpha() and len(word)>1]
    return text

data['removewtopwords'] = data['textlower'].apply(lambda x: remove_stopwords(x))
data['removewtopwords']
Out[771]:
0                                         [watch, tweaked]
1        [gjith, kom, dasht, veq, ti, je, tu, shti, nqe...
2                 [like, house, starktony, stark, mashups]
3        [reading, reactions, last, nights, episode, hi...
4        [dont, know, im, impressed, disgusted, brillia...
                               ...                        
27330    [mins, episode, finished, game, thrones, taken...
27331     [actually, fuming, game, thrones, ha, sad, must]
27332    [entirely, possible, twitter, watched, game, t...
27333    [entirely, possible, twitter, watched, game, t...
27334    [entirely, possible, twitter, watched, game, t...
Name: removewtopwords, Length: 27335, dtype: object
In [772]:
# Take all the tokens (separated words) and append them in a list
# I have a giant list of all words that have been writen in the tweets

tokens = []

for sublist in data.removewtopwords:
    for word in sublist:
        tokens.append(word)
In [773]:
# Calculate the length of the list / number of entries!
nb_tokens = len(tokens)
print("The total number of words for all the tweets is " + str(nb_tokens) + ".")
The total number of words for all the tweets is 180459.
In [774]:
# Turn the list into a dataframe -> easier to deal with!
tokens_df = pd.DataFrame(tokens)
tokens_df.columns = ['words']
In [775]:
# Calculate the number of time a word appear in the tweets 
tokens_df['count'] = tokens_df.groupby('words')['words'].transform('count')
In [776]:
# Order the dataframe with the highest number of words at the top. 
tokens_df = tokens_df.sort_values(by=['count'], ascending=False)
# Drop the duplicates of the words! Show only the top ten first word!
tokens_df = tokens_df.drop_duplicates(['words'], keep='first')
In [777]:
#for ' ' values
tokens_df.columns = tokens_df.columns.str.replace(' ', '')
tokens_df.dropna(inplace=True)
In [778]:
index_names = tokens_df[(tokens_df['words']=='')].index
tokens_df.drop(index_names, inplace = True)
In [779]:
tokens_df[:10].plot.bar(x='words', y='count', rot=90);
In [780]:
# Add the frequency of occurance of the words!
tokens_df['frequency_totwords'] = (tokens_df['count']/nb_tokens)*100
tokens_df[:10]
tokens_df[:10]['words'].tolist()
Out[780]:
['game',
 'thrones',
 'episode',
 'last',
 'im',
 'de',
 'fuck',
 'wedding',
 'watching',
 'still']
In [781]:
tokens_df[:10].plot.hist(x='words', y='count');
In [782]:
#tokens

6- Stopwords in brown corpus and Got tweets

Extract the stopword list for the English language with the help of nltk. Download the standard Brown Corpus also from nltk, count the relative frequency of stopwords in both the Brown Corpus and the GoT tweets. Make a scatterplot of your results, try to explain possible similarities and deviations. What is the correlation in the stopword frequencies of the two datasets?

Firstly I downloaded brown corpus then I found stopwords in brown dataset for words column. Then I added to the list. Calculated the length of the list Calculate the number of time a word appear. Ordered the dataframe with the highest number of words at the top. Droped the duplicates of the words! Showed only the top ten first words. Then I counted frequency of stopwords in brown corpus. And made a scatterplot that includes frequency of stopwords in brown corpus. Then I followed the same steps for got tweets. brown corpus top10 stop words: 'the', 'of', 'and', 'to', 'a', 'in', 'that', 'is', 'was', 'for' got tweets top10 stop words: 'of', 'the', 'i', 'to', 'a', 'that', 'in', 'just', 'was', 'what'

In [783]:
nltk.download('brown')
[nltk_data] Downloading package brown to /home/fmx2hx/nltk_data...
[nltk_data]   Package brown is already up-to-date!
Out[783]:
True
In [784]:
from nltk.corpus import brown
#brown=brown.words()
In [785]:
brown=list(brown.words())
In [786]:
#brown
In [787]:
###brown stop word
In [788]:
stopwordbrown= nltk.corpus.stopwords.words('english') #all english stopwords
browntext = [word for word in brown if word in stopwordbrown]#to return all stopwords
In [789]:
#browntext
In [790]:
# Calculate the length of the list / number of entries!
nb_brown = len(browntext)
print("The total number of brown for all the tweets is " + str(nb_brown) + ".")

# Turn the list into a dataframe
The total number of brown for all the tweets is 433691.
In [791]:
browntext_df = pd.DataFrame(browntext)
browntext_df.columns = ['stops']
In [792]:
# Calculate the number of time a word appear.
browntext_df['count'] = browntext_df.groupby('stops')['stops'].transform('count')
In [793]:
# Order the dataframe with the highest number of words at the top. 
browntext_df = browntext_df.sort_values(by=['count'], ascending=False)
# Drop the duplicates of the words! Show only the top ten first world!
browntext_df = browntext_df.drop_duplicates(['stops'], keep='first')
browntext_df
Out[793]:
stops count
273885 the 62713
299084 of 36080
392304 and 27915
245352 to 25732
272852 a 21881
... ... ...
70296 shan't 1
359824 y 1
58046 don 1
202333 o 1
281298 re 1

155 rows × 2 columns

In [794]:
browntext_df[:10].plot.bar(x='stops', y='count', rot=90);
In [795]:
# Add the frequency of occurance of the words!
browntext_df['frequency_stopwords'] = (browntext_df['count']/nb_tokens)*100
browntext_df[:10]
browntext_df[:10].plot.hist(x='stops', y='count');
In [796]:
brownstop_dftop10=browntext_df[:10]  
In [797]:
brownstop_dftop10['stops'].tolist()
Out[797]:
['the', 'of', 'and', 'to', 'a', 'in', 'that', 'is', 'was', 'for']
In [798]:
import seaborn as sns
import matplotlib.pyplot as plt
In [799]:
sns.scatterplot(data=brownstop_dftop10, x="stops", y="frequency_stopwords")
Out[799]:
<AxesSubplot:xlabel='stops', ylabel='frequency_stopwords'>
In [800]:
#for got dataset
In [801]:
stopwordgot= nltk.corpus.stopwords.words('english') #all english stopwords
def find_stopwords(stop_list):
    text = [word for word in stop_list if word in stopwordgot]#to return all stopwords
    return text
data['stopwordsall'] = data['textlower'].apply(lambda x: find_stopwords(x))
In [802]:
gotstopword=data['stopwordsall'].values.tolist()
In [803]:
#gotstopword
In [804]:
gotstop = []

for sublist in gotstopword:
    for word in sublist:
        gotstop.append(word)
In [805]:
# Calculate the length of the list / number of entries!
nb_stops = len(gotstop)
print("The total number of words for all the tweets is " + str(nb_stops) + ".")

# Turn the list into a dataframe -> easier to deal with!
gotstop_df = pd.DataFrame(gotstop)
gotstop_df.columns = ['gotstopwords']
The total number of words for all the tweets is 117982.
In [806]:
gotstop_df
Out[806]:
gotstopwords
0 about
1 to
2 and
3 i
4 am
... ...
117977 just
117978 of
117979 and
117980 to
117981 itself

117982 rows × 1 columns

In [807]:
# Calculate the number of time a word appear in the tweets 
gotstop_df['count'] = gotstop_df.groupby('gotstopwords')['gotstopwords'].transform('count')
In [808]:
gotstop_df
Out[808]:
gotstopwords count
0 about 1191
1 to 5000
2 and 2610
3 i 7151
4 am 514
... ... ...
117977 just 3122
117978 of 14494
117979 and 2610
117980 to 5000
117981 itself 30

117982 rows × 2 columns

In [809]:
# Order the dataframe with the highest number of words at the top. 
gotstop_df = gotstop_df.sort_values(by=['count'], ascending=False)
# Drop the duplicates of the words! Show only the top ten first world!
gotstop_df = gotstop_df.drop_duplicates(['gotstopwords'], keep='first')
gotstop_df
Out[809]:
gotstopwords count
63267 of 14494
62718 the 10179
81020 i 7151
53405 to 5000
72713 a 4436
... ... ...
81248 don 2
35298 whom 1
22006 below 1
42758 didn 1
1687 hers 1

135 rows × 2 columns

In [810]:
# Print the 10 most frequent words!
gotstop_df[:10].plot.bar(x='gotstopwords', y='count', rot=90);
In [811]:
# Add the frequency of occurance of the words!
gotstop_df['frequency_stopwords'] = (gotstop_df['count']/nb_tokens)*100
gotstop_df[:10]
gotstop_df[:10].plot.hist(x='gotstopwords', y='count');
In [812]:
gotstop_dftop10=gotstop_df[:10]
In [813]:
gotstop_dftop10
Out[813]:
gotstopwords count frequency_stopwords
63267 of 14494 8.031741
62718 the 10179 5.640616
81020 i 7151 3.962673
53405 to 5000 2.770712
72713 a 4436 2.458176
108631 that 3529 1.955569
65967 in 3311 1.834766
96452 just 3122 1.730033
12785 was 3106 1.721167
75679 what 2974 1.648020
In [814]:
sns.scatterplot(data=gotstop_dftop10, x="gotstopwords", y="frequency_stopwords")
Out[814]:
<AxesSubplot:xlabel='gotstopwords', ylabel='frequency_stopwords'>
In [815]:
gotstop_dftop10  #brownstop_dftop10
Out[815]:
gotstopwords count frequency_stopwords
63267 of 14494 8.031741
62718 the 10179 5.640616
81020 i 7151 3.962673
53405 to 5000 2.770712
72713 a 4436 2.458176
108631 that 3529 1.955569
65967 in 3311 1.834766
96452 just 3122 1.730033
12785 was 3106 1.721167
75679 what 2974 1.648020
In [816]:
brownstop_dftop10
Out[816]:
stops count frequency_stopwords
273885 the 62713 34.751938
299084 of 36080 19.993461
392304 and 27915 15.468888
245352 to 25732 14.259195
272852 a 21881 12.125192
356434 in 19536 10.825728
268399 that 10237 5.672757
281570 is 10011 5.547520
345852 was 9777 5.417851
149061 for 8841 4.899174
In [817]:
x=gotstop_dftop10['gotstopwords']
In [818]:
x.tolist()
Out[818]:
['of', 'the', 'i', 'to', 'a', 'that', 'in', 'just', 'was', 'what']
In [819]:
y=brownstop_dftop10['stops']
In [820]:
y.tolist()
Out[820]:
['the', 'of', 'and', 'to', 'a', 'in', 'that', 'is', 'was', 'for']

7 Wordcloud

A really common tool to visualize texts is a wordcloud. Find a suitable library and create a meaningful wordcloud of the GoT tweets (e.g. leave out punctuation, stopwords etc.)

I had deleted the punctuation and stopwords above. I rejoined meaningful stemmed words into a single string then created data['processed']. Generated a word cloud image importing libraries WordCloud, STOPWORDS

In [821]:
data['removewtopwords'] # I had deleted the punctuation and stopwords above
Out[821]:
0                                         [watch, tweaked]
1        [gjith, kom, dasht, veq, ti, je, tu, shti, nqe...
2                 [like, house, starktony, stark, mashups]
3        [reading, reactions, last, nights, episode, hi...
4        [dont, know, im, impressed, disgusted, brillia...
                               ...                        
27330    [mins, episode, finished, game, thrones, taken...
27331     [actually, fuming, game, thrones, ha, sad, must]
27332    [entirely, possible, twitter, watched, game, t...
27333    [entirely, possible, twitter, watched, game, t...
27334    [entirely, possible, twitter, watched, game, t...
Name: removewtopwords, Length: 27335, dtype: object
In [822]:
#I rejoined meaningful stemmed words into a single string.
def rejoin_words(data):
    my_list = data['removewtopwords']
    joined_words = ( " ".join(my_list))
    return joined_words

data['processed'] = data.apply(rejoin_words, axis=1)
In [823]:
data['processed'] 
Out[823]:
0                                            watch tweaked
1        gjith kom dasht veq ti je tu shti nqef edhe shume
2                       like house starktony stark mashups
3        reading reactions last nights episode hilariou...
4        dont know im impressed disgusted brilliant non...
                               ...                        
27330    mins episode finished game thrones taken feeli...
27331             actually fuming game thrones ha sad must
27332    entirely possible twitter watched game thrones...
27333    entirely possible twitter watched game thrones...
27334    entirely possible twitter watched game thrones...
Name: processed, Length: 27335, dtype: object
In [824]:
stopwords = set(STOPWORDS)

# Generate a word cloud image
text = "".join(word for word in data.processed)
wordcloud = WordCloud(stopwords=stopwords, background_color="black", width=800, height=400).generate(text)

# Display the generated image:
# the matplotlib way:
plt.axis("off")
plt.figure( figsize=(40,20))
plt.tight_layout(pad=0)
plt.imshow(wordcloud, interpolation='bilinear')
plt.show()

8 Term-document matrix

Define a time window in which all tweets count as one document. Create the term-document matrix of the tweets for this time segmentation. Apply stemming and stopword filtering.

I defined time window dataset and draw graph that shows all tweets count according to time. Then I created the term-document matrix of the tweets and applied stemming and stopword filtering.

In [825]:
# I created time window 
data2=data[['created_at','processed']]
data2['freq']=data2.groupby(by='created_at')['created_at'].transform('count')
/home/fmx2hx/.conda/envs/conda/lib/python3.7/site-packages/ipykernel/__main__.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  app.launch_new_instance()
In [826]:
data3=data2.drop('processed',axis=1).drop_duplicates()
In [827]:
data3
Out[827]:
created_at freq
0 6/3/2013 18:45 184
77 6/3/2013 18:44 99
115 6/3/2013 18:43 113
131 6/3/2013 18:42 116
141 6/3/2013 18:41 104
... ... ...
25066 6/3/2013 22:24 494
25679 6/3/2013 22:25 485
26135 6/3/2013 22:26 506
26710 6/3/2013 22:27 415
27125 6/3/2013 22:28 195

292 rows × 2 columns

In [828]:
data3['created_at'] = pd.to_datetime(data3['created_at'])
dataset_n = data3.set_index('created_at')
dataset_n.index
Out[828]:
DatetimeIndex(['2013-06-03 18:45:00', '2013-06-03 18:44:00',
               '2013-06-03 18:43:00', '2013-06-03 18:42:00',
               '2013-06-03 18:41:00', '2013-06-03 18:40:00',
               '2013-06-03 18:39:00', '2013-06-03 18:38:00',
               '2013-06-03 18:37:00', '2013-06-03 18:36:00',
               ...
               '2013-06-03 22:19:00', '2013-06-03 22:20:00',
               '2013-06-03 22:21:00', '2013-06-03 22:22:00',
               '2013-06-03 22:23:00', '2013-06-03 22:24:00',
               '2013-06-03 22:25:00', '2013-06-03 22:26:00',
               '2013-06-03 22:27:00', '2013-06-03 22:28:00'],
              dtype='datetime64[ns]', name='created_at', length=292, freq=None)
In [829]:
title_font= {"family" : "Cambria",
             "size" : 15,
             "color" : "black",
             "weight" : "bold"}

plt.rcParams.update({'figure.figsize': (10,6), 'figure.dpi': 120})

by_time = dataset_n.groupby(dataset_n.index.time).sum()
hourly_ticks = 2 * 60 * 60 *  np.arange(12)
by_time.plot(xticks=hourly_ticks, style='--o', color='blue')
plt.title('Frequency Per Hour', fontdict=title_font)
plt.xlabel('hour')
plt.ylabel('frequency')
plt.grid(axis='x')
plt.show();

I created term document matrix for tweets

In [830]:
data['processed']
Out[830]:
0                                            watch tweaked
1        gjith kom dasht veq ti je tu shti nqef edhe shume
2                       like house starktony stark mashups
3        reading reactions last nights episode hilariou...
4        dont know im impressed disgusted brilliant non...
                               ...                        
27330    mins episode finished game thrones taken feeli...
27331             actually fuming game thrones ha sad must
27332    entirely possible twitter watched game thrones...
27333    entirely possible twitter watched game thrones...
27334    entirely possible twitter watched game thrones...
Name: processed, Length: 27335, dtype: object
In [831]:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
In [832]:
c = CountVectorizer()
dtf=c.fit_transform(data['processed'].head(10).tolist()).todense() #Create the term-document matrix of the tweets 
In [833]:
strfeature=' '.join(c.get_feature_names()) #It excluded so-called stopwords.
In [834]:
matrix=pd.DataFrame(dtf, columns =c.get_feature_names())
In [835]:
matrix
Out[835]:
acabo anything asimilar bolita brilliant capitulo cc dasht de disgusted dont edhe el en episode gjith going happen happened hell hilarious holy house im impressed je know kom last laughing like mad mashups nights nonetheless nqef outcome people que reactions reading sad scarred shti shume sigo spoil spoiler stark starktony surprised tears ti tratando true tu tweaked um veq ver watch
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1
1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0
3 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 1 1 0 1 0 0 0 0 1 0 0 1 0 1 0 0 0 1 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0
6 1 0 1 1 0 1 0 0 2 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0
7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
In [836]:
matrix.sum(axis=1)
Out[836]:
0     2
1    11
2     5
3    14
4     7
5     6
6    12
7     4
8     2
9     1
dtype: int64
In [837]:
#steming 
text = "".join(word for word in data.processed )
nltk.download('punkt')
porter = PorterStemmer()
from nltk.tokenize import sent_tokenize, word_tokenize
def stemSentence(text):
    token_words=word_tokenize(text)
    token_words
    stem_sentence=[]
  
    for word in token_words:
        stem_sentence.append(porter.stem(word))
        stem_sentence.append(" ")
    return "".join(stem_sentence)

x=stemSentence(text) # applied stemming and stopword filtering to text 
#print(x)
[nltk_data] Downloading package punkt to /home/fmx2hx/nltk_data...
[nltk_data]   Package punkt is already up-to-date!

9- Topic Detection

Apply a TF-IDF weighting scheme for the term-document matrix by hand (e.g. do not use a built-in vectorizer, but normalize by text length with a summation etc. numpy or pandas is strongly suggested). Then, choose a topic detection method such as LSI or LDA, and run it on your matrix. Try to interpret your results! Are your topics meaningful? Which topics are the most representative of your document?

At this part I tried to apply TF-IDF weighting scheme for the term-document matrix. Then chose a LDA topic detection method. Most highest score topic with using TF_IDF and LDA is

Score: 0.7666631937026978
Topic: 0.016"game" + 0.016"thrones" + 0.015"cant" + 0.013"im" + 0.009"sad" + 0.009"ending" + 0.008"telling" + 0.008"watching" + 0.008"episode" + 0.008"traumatised" + 0.008"ive" + 0.008"show" + 0.008"time" + 0.007"books" + 0.007*"read

I understand that this episode end very sadly and traumatic and many viewers dissappointed

In [838]:
matrix.sum(axis=1)
Out[838]:
0     2
1    11
2     5
3    14
4     7
5     6
6    12
7     4
8     2
9     1
dtype: int64
In [839]:
matrix/matrix.sum(axis=1)
Out[839]:
acabo anything asimilar bolita brilliant capitulo cc dasht de disgusted dont edhe el en episode gjith going happen happened hell hilarious holy house im impressed je know kom last laughing like mad mashups nights nonetheless nqef outcome people que reactions reading sad scarred shti shume sigo spoil spoiler stark starktony surprised tears ti tratando true tu tweaked um veq ver watch 0 1 2 3 4 5 6 7 8 9
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
7 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
8 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
In [840]:
matrix.div(matrix.sum(axis=1))
Out[840]:
acabo anything asimilar bolita brilliant capitulo cc dasht de disgusted dont edhe el en episode gjith going happen happened hell hilarious holy house im impressed je know kom last laughing like mad mashups nights nonetheless nqef outcome people que reactions reading sad scarred shti shume sigo spoil spoiler stark starktony surprised tears ti tratando true tu tweaked um veq ver watch 0 1 2 3 4 5 6 7 8 9
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
7 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
8 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
In [841]:
matrix.div(matrix.sum(axis=1), axis=0)
Out[841]:
acabo anything asimilar bolita brilliant capitulo cc dasht de disgusted dont edhe el en episode gjith going happen happened hell hilarious holy house im impressed je know kom last laughing like mad mashups nights nonetheless nqef outcome people que reactions reading sad scarred shti shume sigo spoil spoiler stark starktony surprised tears ti tratando true tu tweaked um veq ver watch
0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.0 0.000000 0.000000 0.000000 0.00 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.00 0.000000 0.000000 0.000000 0.000000 0.000000 0.5 0.0 0.000000 0.000000 0.5
1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.090909 0.000000 0.000000 0.000000 0.090909 0.000000 0.000000 0.000000 0.090909 0.000000 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.090909 0.000000 0.090909 0.000000 0.000000 0.0 0.000000 0.0 0.000000 0.000000 0.090909 0.00 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.090909 0.090909 0.000000 0.000000 0.000000 0.0 0.0 0.00 0.000000 0.090909 0.000000 0.000000 0.090909 0.0 0.0 0.090909 0.000000 0.0
2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.000000 0.000000 0.000000 0.2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.2 0.000000 0.2 0.000000 0.000000 0.000000 0.00 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.000000 0.000000 0.000000 0.000000 0.000000 0.2 0.2 0.00 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.000000 0.000000 0.0
3 0.000000 0.071429 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.071429 0.000000 0.071429 0.0 0.0 0.071429 0.071429 0.071429 0.0 0.071429 0.000000 0.000000 0.000000 0.000000 0.071429 0.000000 0.0 0.071429 0.0 0.071429 0.000000 0.000000 0.00 0.071429 0.000000 0.071429 0.071429 0.000000 0.00 0.000000 0.000000 0.000000 0.071429 0.000000 0.0 0.0 0.00 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.000000 0.000000 0.0
4 0.000000 0.000000 0.000000 0.000000 0.142857 0.000000 0.000000 0.000000 0.000000 0.142857 0.142857 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.142857 0.142857 0.000000 0.142857 0.000000 0.000000 0.000000 0.0 0.000000 0.0 0.000000 0.142857 0.000000 0.00 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.00 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.000000 0.000000 0.0
5 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.166667 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.166667 0.0 0.000000 0.0 0.000000 0.000000 0.000000 0.00 0.000000 0.000000 0.000000 0.000000 0.166667 0.00 0.000000 0.000000 0.000000 0.000000 0.166667 0.0 0.0 0.00 0.166667 0.000000 0.000000 0.166667 0.000000 0.0 0.0 0.000000 0.000000 0.0
6 0.083333 0.000000 0.083333 0.083333 0.000000 0.083333 0.000000 0.000000 0.166667 0.000000 0.000000 0.000000 0.083333 0.083333 0.000000 0.000000 0.000000 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.0 0.000000 0.000000 0.000000 0.00 0.000000 0.083333 0.000000 0.000000 0.000000 0.00 0.000000 0.000000 0.083333 0.000000 0.000000 0.0 0.0 0.00 0.000000 0.000000 0.083333 0.000000 0.000000 0.0 0.0 0.000000 0.083333 0.0
7 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.0 0.000000 0.250000 0.000000 0.25 0.000000 0.000000 0.000000 0.000000 0.000000 0.25 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.25 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.000000 0.000000 0.0
8 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.5 0.0 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.0 0.000000 0.000000 0.000000 0.00 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.00 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.5 0.000000 0.000000 0.0
9 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 1.0 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.0 0.000000 0.000000 0.000000 0.00 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.00 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.000000 0.000000 0.0
In [842]:
matrix.div(matrix.sum(axis=1), axis=0)
matrix2=matrix.astype(bool).sum()
In [843]:
matrix.astype(bool).sum()['de']
Out[843]:
1
In [844]:
import math
matrix.div(matrix.sum(axis=1), axis=0)
matrix2=matrix.astype(bool).mean()
matrix3=matrix2.apply(lambda x:-math.log(1/(1+x)))
matrix4=matrix.div(matrix.sum(axis=1), axis=0)
In [845]:
matrix5=matrix4.mul(matrix3,axis=1)
In [846]:
matrix5 #TF-IDF
Out[846]:
acabo anything asimilar bolita brilliant capitulo cc dasht de disgusted dont edhe el en episode gjith going happen happened hell hilarious holy house im impressed je know kom last laughing like mad mashups nights nonetheless nqef outcome people que reactions reading sad scarred shti shume sigo spoil spoiler stark starktony surprised tears ti tratando true tu tweaked um veq ver watch
0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.047655 0.000000 0.000000 0.000000 0.047655
1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.008665 0.000000 0.000000 0.000000 0.008665 0.000000 0.000000 0.000000 0.008665 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.008665 0.000000 0.008665 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.008665 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.008665 0.008665 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.008665 0.000000 0.000000 0.008665 0.000000 0.000000 0.008665 0.000000 0.000000
2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.019062 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.019062 0.000000 0.019062 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.019062 0.019062 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
3 0.000000 0.006808 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.006808 0.000000 0.006808 0.000000 0.00000 0.006808 0.006808 0.006808 0.000000 0.013023 0.000000 0.000000 0.000000 0.000000 0.006808 0.000000 0.000000 0.006808 0.000000 0.006808 0.000000 0.000000 0.000000 0.006808 0.000000 0.006808 0.006808 0.000000 0.000000 0.000000 0.000000 0.000000 0.006808 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
4 0.000000 0.000000 0.000000 0.000000 0.013616 0.000000 0.000000 0.000000 0.000000 0.013616 0.013616 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.026046 0.013616 0.000000 0.013616 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.026046 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
5 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.015885 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.015885 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.015885 0.000000 0.000000 0.000000 0.000000 0.000000 0.015885 0.000000 0.000000 0.000000 0.015885 0.000000 0.000000 0.015885 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
6 0.007943 0.000000 0.007943 0.007943 0.000000 0.007943 0.000000 0.000000 0.015885 0.000000 0.000000 0.000000 0.007943 0.007943 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.007943 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.007943 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.007943 0.000000 0.000000 0.000000 0.000000 0.000000 0.007943 0.000000
7 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.045580 0.000000 0.023828 0.000000 0.000000 0.000000 0.000000 0.000000 0.023828 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.023828 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
8 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.047655 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.047655 0.000000 0.000000 0.000000
9 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.09531 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
In [847]:
Processeddata=data['removewtopwords']
In [848]:
Processeddata
Out[848]:
0                                         [watch, tweaked]
1        [gjith, kom, dasht, veq, ti, je, tu, shti, nqe...
2                 [like, house, starktony, stark, mashups]
3        [reading, reactions, last, nights, episode, hi...
4        [dont, know, im, impressed, disgusted, brillia...
                               ...                        
27330    [mins, episode, finished, game, thrones, taken...
27331     [actually, fuming, game, thrones, ha, sad, must]
27332    [entirely, possible, twitter, watched, game, t...
27333    [entirely, possible, twitter, watched, game, t...
27334    [entirely, possible, twitter, watched, game, t...
Name: removewtopwords, Length: 27335, dtype: object
In [849]:
dictionary = corpora.Dictionary(Processeddata)
In [850]:
count = 0
for k, v in dictionary.iteritems():
    print(k, v)
    count += 1
    if count > 10:
        break
0 tweaked
1 watch
2 dasht
3 edhe
4 gjith
5 je
6 kom
7 nqef
8 shti
9 shume
10 ti
In [851]:
dictionary.filter_extremes(no_below=15, no_above=0.5, keep_n=100000)
In [852]:
bow_corpus = [dictionary.doc2bow(doc) for doc in Processeddata]
bow_corpus[4310]
Out[852]:
[(12, 1), (100, 1), (788, 1)]
In [853]:
bow_doc_4310 = bow_corpus[4310]

for i in range(len(bow_doc_4310)):
    print("Word {} (\"{}\") appears {} time.".format(bow_doc_4310[i][0], 
                                                     dictionary[bow_doc_4310[i][0]], 
                                                     bow_doc_4310[i][1]))
Word 12 ("im") appears 1 time.
Word 100 ("time") appears 1 time.
Word 788 ("telling") appears 1 time.

TF-IDF

In [854]:
from gensim import corpora, models

tfidf = models.TfidfModel(bow_corpus)
In [855]:
corpus_tfidf = tfidf[bow_corpus]
In [856]:
from pprint import pprint

for doc in corpus_tfidf:
    pprint(doc)
    break
[(0, 1.0)]
In [857]:
lda_model = models.LdaMulticore(bow_corpus, num_topics=15, id2word=dictionary, passes=5, iterations=50) # lda model 
In [858]:
for idx, topic in lda_model.print_topics(-1):
    print('Topic: {} \nWords: {}'.format(idx, topic))
Topic: 0 
Words: 0.065*"george" + 0.063*"martin" + 0.057*"rr" + 0.052*"twitter" + 0.049*"characters" + 0.049*"killed" + 0.047*"doesnt" + 0.043*"use" + 0.031*"im" + 0.025*"game"
Topic: 1 
Words: 0.089*"game" + 0.086*"thrones" + 0.045*"last" + 0.039*"nights" + 0.030*"watch" + 0.021*"episode" + 0.012*"im" + 0.011*"going" + 0.011*"need" + 0.010*"feel"
Topic: 2 
Words: 0.075*"game" + 0.073*"thrones" + 0.062*"de" + 0.026*"que" + 0.025*"el" + 0.024*"la" + 0.017*"wtf" + 0.016*"im" + 0.015*"speechless" + 0.011*"en"
Topic: 3 
Words: 0.053*"omg" + 0.051*"episode" + 0.039*"game" + 0.038*"thrones" + 0.028*"last" + 0.020*"nights" + 0.018*"ever" + 0.011*"next" + 0.011*"season" + 0.011*"im"
Topic: 4 
Words: 0.044*"game" + 0.042*"thrones" + 0.022*"stark" + 0.019*"dont" + 0.019*"episode" + 0.016*"spoil" + 0.016*"rains" + 0.016*"im" + 0.016*"many" + 0.016*"castamere"
Topic: 5 
Words: 0.131*"game" + 0.129*"thrones" + 0.027*"episode" + 0.020*"watching" + 0.019*"well" + 0.017*"tonight" + 0.012*"im" + 0.011*"didnt" + 0.009*"read" + 0.009*"oh"
Topic: 6 
Words: 0.066*"game" + 0.060*"thrones" + 0.053*"episode" + 0.014*"tonights" + 0.014*"see" + 0.012*"watched" + 0.012*"expecting" + 0.011*"really" + 0.011*"hell" + 0.010*"ever"
Topic: 7 
Words: 0.037*"coming" + 0.036*"wedding" + 0.029*"red" + 0.028*"knew" + 0.027*"still" + 0.027*"cant" + 0.019*"read" + 0.016*"books" + 0.016*"even" + 0.016*"believe"
Topic: 8 
Words: 0.128*"game" + 0.126*"thrones" + 0.032*"happened" + 0.031*"dont" + 0.023*"im" + 0.021*"know" + 0.017*"shock" + 0.012*"episode" + 0.012*"de" + 0.011*"watch"
Topic: 9 
Words: 0.096*"game" + 0.092*"thrones" + 0.077*"fuck" + 0.037*"episode" + 0.030*"holy" + 0.028*"shit" + 0.019*"happened" + 0.018*"say" + 0.017*"anyone" + 0.017*"ill"
Topic: 10 
Words: 0.094*"game" + 0.093*"thrones" + 0.039*"last" + 0.026*"night" + 0.024*"watching" + 0.023*"wedding" + 0.014*"red" + 0.012*"reactions" + 0.012*"nights" + 0.012*"still"
Topic: 11 
Words: 0.041*"game" + 0.039*"thrones" + 0.024*"cry" + 0.022*"little" + 0.021*"episode" + 0.020*"lets" + 0.018*"sit" + 0.017*"dark" + 0.015*"de" + 0.015*"quietly"
Topic: 12 
Words: 0.081*"game" + 0.080*"thrones" + 0.058*"de" + 0.019*"wow" + 0.014*"que" + 0.013*"le" + 0.013*"time" + 0.012*"la" + 0.011*"je" + 0.010*"tomorrow"
Topic: 13 
Words: 0.016*"get" + 0.015*"think" + 0.014*"game" + 0.013*"im" + 0.013*"like" + 0.012*"thrones" + 0.012*"watching" + 0.012*"got" + 0.011*"starks" + 0.011*"wedding"
Topic: 14 
Words: 0.073*"game" + 0.071*"thrones" + 0.043*"oh" + 0.035*"god" + 0.022*"watch" + 0.020*"fucking" + 0.019*"hell" + 0.011*"dont" + 0.011*"think" + 0.011*"people"
In [859]:
#Running LDA using TF-IDF
# I applied TF-IDF to LDA MODEL
In [860]:
lda_model_tfidf = models.LdaMulticore(corpus_tfidf, num_topics=15, id2word=dictionary, passes=5, iterations=50)
In [861]:
for idx, topic in lda_model_tfidf.print_topics(-1):
    print('Topic: {} Word: {}'.format(idx, topic))
Topic: 0 Word: 0.025*"de" + 0.024*"cant" + 0.023*"game" + 0.023*"thrones" + 0.021*"believe" + 0.017*"intense" + 0.015*"get" + 0.013*"la" + 0.012*"depression" + 0.011*"le"
Topic: 1 Word: 0.030*"hell" + 0.027*"shock" + 0.023*"dont" + 0.022*"im" + 0.021*"happened" + 0.020*"know" + 0.020*"game" + 0.019*"thrones" + 0.017*"de" + 0.017*"fucking"
Topic: 2 Word: 0.031*"coming" + 0.031*"see" + 0.025*"watching" + 0.021*"didnt" + 0.019*"still" + 0.018*"last" + 0.017*"thrones" + 0.017*"game" + 0.017*"episode" + 0.015*"nights"
Topic: 3 Word: 0.021*"wedding" + 0.016*"tv" + 0.015*"ever" + 0.014*"best" + 0.014*"red" + 0.014*"game" + 0.013*"thrones" + 0.013*"like" + 0.013*"episode" + 0.010*"de"
Topic: 4 Word: 0.041*"omg" + 0.031*"holy" + 0.031*"fuck" + 0.021*"episode" + 0.019*"thrones" + 0.019*"game" + 0.019*"shit" + 0.018*"say" + 0.016*"ill" + 0.016*"lannister"
Topic: 5 Word: 0.042*"martin" + 0.042*"george" + 0.038*"rr" + 0.036*"characters" + 0.036*"killed" + 0.036*"twitter" + 0.035*"use" + 0.035*"doesnt" + 0.030*"game" + 0.029*"thrones"
Topic: 6 Word: 0.028*"happened" + 0.025*"game" + 0.024*"thrones" + 0.024*"episode" + 0.012*"spoilers" + 0.011*"never" + 0.010*"feel" + 0.010*"last" + 0.010*"im" + 0.009*"watch"
Topic: 7 Word: 0.092*"fuck" + 0.028*"game" + 0.026*"thrones" + 0.020*"happened" + 0.013*"im" + 0.011*"well" + 0.007*"crying" + 0.007*"oh" + 0.007*"cry" + 0.006*"kill"
Topic: 8 Word: 0.044*"wow" + 0.018*"game" + 0.018*"thrones" + 0.012*"reaction" + 0.012*"episode" + 0.012*"cannot" + 0.011*"last" + 0.010*"believe" + 0.010*"shocking" + 0.009*"red"
Topic: 9 Word: 0.032*"holy" + 0.029*"shit" + 0.021*"game" + 0.020*"thrones" + 0.014*"hate" + 0.012*"stark" + 0.012*"bloody" + 0.011*"fucking" + 0.011*"hell" + 0.009*"oh"
Topic: 10 Word: 0.021*"thrones" + 0.021*"game" + 0.017*"watched" + 0.014*"expecting" + 0.012*"heart" + 0.011*"wasnt" + 0.010*"episode" + 0.009*"crazy" + 0.009*"im" + 0.009*"de"
Topic: 11 Word: 0.050*"oh" + 0.045*"god" + 0.044*"wtf" + 0.042*"speechless" + 0.021*"game" + 0.020*"thrones" + 0.015*"brutal" + 0.011*"olsun" + 0.011*"dizimag" + 0.011*"helal"
Topic: 12 Word: 0.018*"thrones" + 0.018*"game" + 0.013*"episode" + 0.013*"face" + 0.010*"reactions" + 0.009*"traumatised" + 0.009*"fans" + 0.009*"frey" + 0.009*"last" + 0.009*"compilation"
Topic: 13 Word: 0.089*"game" + 0.087*"thrones" + 0.023*"rains" + 0.023*"castamere" + 0.012*"episode" + 0.010*"jesus" + 0.009*"shocked" + 0.009*"christ" + 0.009*"last" + 0.009*"even"
Topic: 14 Word: 0.030*"watch" + 0.018*"game" + 0.018*"thrones" + 0.015*"rt" + 0.015*"omfg" + 0.015*"think" + 0.011*"last" + 0.010*"books" + 0.010*"wedding" + 0.010*"never"
In [862]:
Processeddata[4310]
Out[862]:
['time', 'im', 'telling']
In [863]:
# I looked score of topics
for index, score in sorted(lda_model[bow_corpus[4310]], key=lambda tup: -1*tup[1]):
    print("\nScore: {}\t \nTopic: {}".format(score, lda_model.print_topic(index, 15)))
Score: 0.7666619420051575	 
Topic: 0.065*"george" + 0.063*"martin" + 0.057*"rr" + 0.052*"twitter" + 0.049*"characters" + 0.049*"killed" + 0.047*"doesnt" + 0.043*"use" + 0.031*"im" + 0.025*"game" + 0.023*"thrones" + 0.012*"hate" + 0.011*"watching" + 0.011*"like" + 0.010*"reaction"

Score: 0.016667049378156662	 
Topic: 0.081*"game" + 0.080*"thrones" + 0.058*"de" + 0.019*"wow" + 0.014*"que" + 0.013*"le" + 0.013*"time" + 0.012*"la" + 0.011*"je" + 0.010*"tomorrow" + 0.010*"un" + 0.009*"twitter" + 0.009*"episode" + 0.009*"hope" + 0.008*"difficult"

Score: 0.01666702702641487	 
Topic: 0.053*"omg" + 0.051*"episode" + 0.039*"game" + 0.038*"thrones" + 0.028*"last" + 0.020*"nights" + 0.018*"ever" + 0.011*"next" + 0.011*"season" + 0.011*"im" + 0.010*"ive" + 0.010*"best" + 0.008*"everyone" + 0.007*"rt" + 0.007*"shocked"

Score: 0.016667010262608528	 
Topic: 0.094*"game" + 0.093*"thrones" + 0.039*"last" + 0.026*"night" + 0.024*"watching" + 0.023*"wedding" + 0.014*"red" + 0.012*"reactions" + 0.012*"nights" + 0.012*"still" + 0.011*"best" + 0.011*"im" + 0.008*"episode" + 0.008*"via" + 0.008*"one"

Score: 0.01666700839996338	 
Topic: 0.016*"get" + 0.015*"think" + 0.014*"game" + 0.013*"im" + 0.013*"like" + 0.012*"thrones" + 0.012*"watching" + 0.012*"got" + 0.011*"starks" + 0.011*"wedding" + 0.011*"love" + 0.010*"always" + 0.010*"kill" + 0.010*"people" + 0.010*"episode"

Score: 0.01666700653731823	 
Topic: 0.128*"game" + 0.126*"thrones" + 0.032*"happened" + 0.031*"dont" + 0.023*"im" + 0.021*"know" + 0.017*"shock" + 0.012*"episode" + 0.012*"de" + 0.011*"watch" + 0.011*"fuck" + 0.010*"like" + 0.010*"cant" + 0.009*"damn" + 0.007*"wtf"

Score: 0.016667000949382782	 
Topic: 0.041*"game" + 0.039*"thrones" + 0.024*"cry" + 0.022*"little" + 0.021*"episode" + 0.020*"lets" + 0.018*"sit" + 0.017*"dark" + 0.015*"de" + 0.015*"quietly" + 0.014*"havent" + 0.012*"im" + 0.010*"credits" + 0.010*"show" + 0.010*"tonight"

Score: 0.016667000949382782	 
Topic: 0.073*"game" + 0.071*"thrones" + 0.043*"oh" + 0.035*"god" + 0.022*"watch" + 0.020*"fucking" + 0.019*"hell" + 0.011*"dont" + 0.011*"think" + 0.011*"people" + 0.011*"seen" + 0.010*"ive" + 0.010*"twitter" + 0.009*"heart" + 0.008*"see"

Score: 0.016666997224092484	 
Topic: 0.075*"game" + 0.073*"thrones" + 0.062*"de" + 0.026*"que" + 0.025*"el" + 0.024*"la" + 0.017*"wtf" + 0.016*"im" + 0.015*"speechless" + 0.011*"en" + 0.011*"ok" + 0.010*"wow" + 0.010*"shock" + 0.009*"capxedtulo" + 0.008*"con"

Score: 0.016666995361447334	 
Topic: 0.044*"game" + 0.042*"thrones" + 0.022*"stark" + 0.019*"dont" + 0.019*"episode" + 0.016*"spoil" + 0.016*"rains" + 0.016*"im" + 0.016*"many" + 0.016*"castamere" + 0.015*"never" + 0.014*"man" + 0.014*"robb" + 0.012*"well" + 0.012*"anyone"

Score: 0.016666993498802185	 
Topic: 0.131*"game" + 0.129*"thrones" + 0.027*"episode" + 0.020*"watching" + 0.019*"well" + 0.017*"tonight" + 0.012*"im" + 0.011*"didnt" + 0.009*"read" + 0.009*"oh" + 0.008*"books" + 0.008*"dont" + 0.007*"new" + 0.007*"expect" + 0.006*"omfg"

Score: 0.016666993498802185	 
Topic: 0.066*"game" + 0.060*"thrones" + 0.053*"episode" + 0.014*"tonights" + 0.014*"see" + 0.012*"watched" + 0.012*"expecting" + 0.011*"really" + 0.011*"hell" + 0.010*"ever" + 0.010*"happened" + 0.009*"twitter" + 0.009*"die" + 0.009*"end" + 0.008*"last"

Score: 0.016666993498802185	 
Topic: 0.037*"coming" + 0.036*"wedding" + 0.029*"red" + 0.028*"knew" + 0.027*"still" + 0.027*"cant" + 0.019*"read" + 0.016*"books" + 0.016*"even" + 0.016*"believe" + 0.015*"happen" + 0.014*"get" + 0.014*"see" + 0.014*"never" + 0.014*"book"

Score: 0.016666991636157036	 
Topic: 0.089*"game" + 0.086*"thrones" + 0.045*"last" + 0.039*"nights" + 0.030*"watch" + 0.021*"episode" + 0.012*"im" + 0.011*"going" + 0.011*"need" + 0.010*"feel" + 0.010*"way" + 0.009*"know" + 0.008*"like" + 0.008*"watching" + 0.008*"one"

Score: 0.016666991636157036	 
Topic: 0.096*"game" + 0.092*"thrones" + 0.077*"fuck" + 0.037*"episode" + 0.030*"holy" + 0.028*"shit" + 0.019*"happened" + 0.018*"say" + 0.017*"anyone" + 0.017*"ill" + 0.016*"lannister" + 0.016*"make" + 0.016*"sorry" + 0.015*"guy" + 0.015*"spoiled"
In [864]:
# I looked score of topics Tf_IDF
for index, score in sorted(lda_model_tfidf[bow_corpus[4310]], key=lambda tup: -1*tup[1]):
    print("\nScore: {}\t \nTopic: {}".format(score, lda_model_tfidf.print_topic(index, 15)))
Score: 0.7666625380516052	 
Topic: 0.032*"holy" + 0.029*"shit" + 0.021*"game" + 0.020*"thrones" + 0.014*"hate" + 0.012*"stark" + 0.012*"bloody" + 0.011*"fucking" + 0.011*"hell" + 0.009*"oh" + 0.009*"episode" + 0.009*"time" + 0.009*"north" + 0.009*"crap" + 0.008*"awesome"

Score: 0.016666991636157036	 
Topic: 0.030*"hell" + 0.027*"shock" + 0.023*"dont" + 0.022*"im" + 0.021*"happened" + 0.020*"know" + 0.020*"game" + 0.019*"thrones" + 0.017*"de" + 0.017*"fucking" + 0.015*"cry" + 0.013*"little" + 0.012*"sit" + 0.012*"dark" + 0.012*"lets"

Score: 0.016666976734995842	 
Topic: 0.042*"martin" + 0.042*"george" + 0.038*"rr" + 0.036*"characters" + 0.036*"killed" + 0.036*"twitter" + 0.035*"use" + 0.035*"doesnt" + 0.030*"game" + 0.029*"thrones" + 0.016*"time" + 0.014*"tonight" + 0.012*"actual" + 0.010*"episode" + 0.009*"left"

Score: 0.016666973009705544	 
Topic: 0.031*"coming" + 0.031*"see" + 0.025*"watching" + 0.021*"didnt" + 0.019*"still" + 0.018*"last" + 0.017*"thrones" + 0.017*"game" + 0.017*"episode" + 0.015*"nights" + 0.015*"wow" + 0.014*"im" + 0.013*"happen" + 0.013*"knew" + 0.011*"good"

Score: 0.016666971147060394	 
Topic: 0.092*"fuck" + 0.028*"game" + 0.026*"thrones" + 0.020*"happened" + 0.013*"im" + 0.011*"well" + 0.007*"crying" + 0.007*"oh" + 0.007*"cry" + 0.006*"kill" + 0.006*"laugh" + 0.006*"lol" + 0.006*"wat" + 0.006*"de" + 0.006*"feel"

Score: 0.01666695810854435	 
Topic: 0.028*"happened" + 0.025*"game" + 0.024*"thrones" + 0.024*"episode" + 0.012*"spoilers" + 0.011*"never" + 0.010*"feel" + 0.010*"last" + 0.010*"im" + 0.009*"watch" + 0.009*"get" + 0.008*"tonights" + 0.008*"watching" + 0.008*"cant" + 0.008*"need"

Score: 0.01666695810854435	 
Topic: 0.021*"thrones" + 0.021*"game" + 0.017*"watched" + 0.014*"expecting" + 0.012*"heart" + 0.011*"wasnt" + 0.010*"episode" + 0.009*"crazy" + 0.009*"im" + 0.009*"de" + 0.009*"oh" + 0.009*"last" + 0.008*"expect" + 0.008*"fuck" + 0.008*"need"

Score: 0.01666695810854435	 
Topic: 0.018*"thrones" + 0.018*"game" + 0.013*"episode" + 0.013*"face" + 0.010*"reactions" + 0.009*"traumatised" + 0.009*"fans" + 0.009*"frey" + 0.009*"last" + 0.009*"compilation" + 0.008*"watching" + 0.008*"still" + 0.008*"end" + 0.008*"latest" + 0.007*"dont"

Score: 0.0166669562458992	 
Topic: 0.021*"wedding" + 0.016*"tv" + 0.015*"ever" + 0.014*"best" + 0.014*"red" + 0.014*"game" + 0.013*"thrones" + 0.013*"like" + 0.013*"episode" + 0.010*"de" + 0.009*"damn" + 0.009*"im" + 0.008*"last" + 0.007*"shock" + 0.007*"still"

Score: 0.01666695438325405	 
Topic: 0.041*"omg" + 0.031*"holy" + 0.031*"fuck" + 0.021*"episode" + 0.019*"thrones" + 0.019*"game" + 0.019*"shit" + 0.018*"say" + 0.016*"ill" + 0.016*"lannister" + 0.016*"sorry" + 0.016*"words" + 0.015*"anyone" + 0.015*"spoiled" + 0.015*"make"

Score: 0.01666695438325405	 
Topic: 0.044*"wow" + 0.018*"game" + 0.018*"thrones" + 0.012*"reaction" + 0.012*"episode" + 0.012*"cannot" + 0.011*"last" + 0.010*"believe" + 0.010*"shocking" + 0.009*"red" + 0.009*"happened" + 0.009*"stark" + 0.009*"wedding" + 0.008*"rob" + 0.008*"epic"

Score: 0.01666695438325405	 
Topic: 0.050*"oh" + 0.045*"god" + 0.044*"wtf" + 0.042*"speechless" + 0.021*"game" + 0.020*"thrones" + 0.015*"brutal" + 0.011*"olsun" + 0.011*"dizimag" + 0.011*"helal" + 0.010*"episode" + 0.010*"happened" + 0.009*"fucking" + 0.008*"im" + 0.008*"last"

Score: 0.016666950657963753	 
Topic: 0.089*"game" + 0.087*"thrones" + 0.023*"rains" + 0.023*"castamere" + 0.012*"episode" + 0.010*"jesus" + 0.009*"shocked" + 0.009*"christ" + 0.009*"last" + 0.009*"even" + 0.008*"watching" + 0.008*"oo" + 0.008*"cant" + 0.008*"done" + 0.008*"di"

Score: 0.016666950657963753	 
Topic: 0.030*"watch" + 0.018*"game" + 0.018*"thrones" + 0.015*"rt" + 0.015*"omfg" + 0.015*"think" + 0.011*"last" + 0.010*"books" + 0.010*"wedding" + 0.010*"never" + 0.009*"read" + 0.008*"really" + 0.008*"de" + 0.008*"nights" + 0.007*"going"

Score: 0.016666945070028305	 
Topic: 0.025*"de" + 0.024*"cant" + 0.023*"game" + 0.023*"thrones" + 0.021*"believe" + 0.017*"intense" + 0.015*"get" + 0.013*"la" + 0.012*"depression" + 0.011*"le" + 0.009*"finished" + 0.009*"episode" + 0.008*"silence" + 0.008*"credits" + 0.008*"silent"

Exercise 7

Write an own name parser for the tweets, and consider all names that you find in the dataset as a node of a graph. Add 1 to the weight of an edge if two names occur in the same tweet. With the help of networkx, draw the weighted network of names from the text. Try to find a simple clustering algorithm in networkx, cluster the names in the dataset. Print or visualize your results!

Exercise 8

This episode caused severe disappointments in many viewers, because of the sudden death of too many of the favourite characters. Search for some sentiment analysis method, and create a timeline of sentiments based on the tweet texts. Do the sentiments on Twitter reflect the time of the worst scene?

Sentiment analysis (or opinion mining) is a natural language processing technique used to determine whether data is positive, negative or neutral. There are a lot of different ways in determining whether a sentiment of a sentence is considered as positive, negative, or neutral. For this analysis, I used a package named TextBlob to score each sentence spoken by every unique character on our dataset. Scores provided by TextBlob consist of two values which are polarity and subjectivity. Polarity score is between -1 to 1 which define the attitude as positive, negative, or neutral in a statement, while subjectivity score is between 0 to 1 referring to personal opinion, emotion, or judgement. However, I made use of polarity score to support this analysis. positive sentiment : polarity ≥ +0.5 negative sentiment : polarity ≤ -0.5 neutral sentiment : -0.5 < polarity < +0.5 Then I created a timeline for negative sentiments to reflect the time of the worst scene. I saw 1267 tweets are positive, 1832 tweets are negative and 25239 tweets are neutral

In [865]:
data
Out[865]:
id created_at created_at_shift from_user from_user_id from_user_id_str from_user_name id_str in_reply_to_status_id in_reply_to_status_id_str iso_language_code latitude longitude metadata place profile_image_url profile_image_url_https query source text to_user to_user_id to_user_id_str to_user_name type hashtag textclean textcleanpunct textlower removewtopwords stopwordsall processed
0 3.41612e+17 6/3/2013 18:45 0 TheMadamEditor 337689639 337689639 madam-editor 3.41612e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/1448601184/... https://si0.twimg.com/profile_images/144860118... #gameofthrones <a href="http://twitter.com/download/iphone">T... About to watch #GameOfThrones and I am tweaked. nan nan nan nan nan ['#gameofthrones'] About to watch and I am tweaked. About to watch and I am tweaked [about, to, watch, and, i, am, tweaked] [watch, tweaked] [about, to, and, i, am] watch tweaked
1 3.41612e+17 6/3/2013 18:45 0 nitaselimi 421347539 421347539 Nita Selimi 3.41612e+17 3.41611e+17 3.41611e+17 tl nan nan result_type=recent nan http://a0.twimg.com/profile_images/3570330661/... https://si0.twimg.com/profile_images/357033066... #gameofthrones <a href="http://twitter.com/download/android">... @Grangjii gjith e kom dasht, veq ti je tu ma s... Grangjii 45957016.0 45957016.0 Granit Gjevukaj nan ['#gameofthrones'] gjith e kom dasht, veq ti je tu ma shti nqef e... gjith e kom dasht veq ti je tu ma shti nqef ed... [gjith, e, kom, dasht, veq, ti, je, tu, ma, sh... [gjith, kom, dasht, veq, ti, je, tu, shti, nqe... [ma, ma] gjith kom dasht veq ti je tu shti nqef edhe shume
2 3.41612e+17 6/3/2013 18:45 0 dh_editorial 256671039 256671039 Dee @ EditorialEyes 3.41612e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/1252656506/... https://si0.twimg.com/profile_images/125265650... #gameofthrones <a href="http://twitter.com/">web</a> Are there, like, House Stark/Tony Stark mashup... nan nan nan nan nan ['#gameofthrones', '#ironman'] Are there, like, House Stark/Tony Stark mashup... Are there like House StarkTony Stark mashups o... [are, there, like, house, starktony, stark, ma... [like, house, starktony, stark, mashups] [are, there, out, there, because, there, shoul... like house starktony stark mashups
3 3.41612e+17 6/3/2013 18:45 0 theprint 809334 809334 Rasmus Rasmussen 3.41612e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/1469678734/... https://si0.twimg.com/profile_images/146967873... #gameofthrones <a href="http://www.tweetdeck.com">TweetDeck</a> Reading #GameOfThrones reactions after last ni... nan nan nan nan nan ['#gameofthrones'] Reading reactions after last night's episode i... Reading reactions after last nights episode is... [reading, reactions, after, last, nights, epis... [reading, reactions, last, nights, episode, hi... [after, is, not, to, but, are] reading reactions last nights episode hilariou...
4 3.41612e+17 6/3/2013 18:45 0 Mr_Twenty2 69222052 69222052 Marty Caan 3.41612e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/1492961144/... https://si0.twimg.com/profile_images/149296114... #gameofthrones <a href="http://twitter.com/">web</a> I don't know if I'm impressed or disgusted! Br... nan nan nan nan nan ['#got', '#gameofthrones'] I don't know if I'm impressed or disgusted! Br... I dont know if Im impressed or disgusted Brill... [i, dont, know, if, im, impressed, or, disgust... [dont, know, im, impressed, disgusted, brillia... [i, if, or] dont know im impressed disgusted brilliant non...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
27330 3.41668e+17 6/3/2013 22:28 0 RocketQueen2x5 155242080 155242080 Funke Aleshe 3.41668e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/3653474295/... https://si0.twimg.com/profile_images/365347429... game+of+thrones <a href="http://twitter.com/download/android">... 30mins after the episode has finished, Game o... nan nan nan nan nan [] mins after the episode has finished, Game of T... mins after the episode has finished Game of Th... [mins, after, the, episode, has, finished, gam... [mins, episode, finished, game, thrones, taken... [after, the, has, of, has, all, my, and] mins episode finished game thrones taken feeli...
27331 3.41668e+17 6/3/2013 22:28 0 Brianpmohan 51492815 51492815 Brian Mohan 3.41668e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/3442035606/... https://si0.twimg.com/profile_images/344203560... game+of+thrones <a href="http://twitter.com/download/iphone">T... Actually fuming with game of thrones ha how sa... nan nan nan nan nan [] Actually fuming with game of thrones ha how sa... Actually fuming with game of thrones ha how sa... [actually, fuming, with, game, of, thrones, ha... [actually, fuming, game, thrones, ha, sad, must] [with, of, how, i, be] actually fuming game thrones ha sad must
27332 3.41668e+17 6/3/2013 22:28 0 WestJamUnited 334073484 334073484 Jamie 3.41668e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/3741402554/... https://si0.twimg.com/profile_images/374140255... game+of+thrones <a href="http://twitter.com/download/iphone">T... RT @WstonesOxfordSt: It's entirely possible th... nan nan nan nan nan [] : It's entirely possible that Twitter just wat... Its entirely possible that Twitter just watch... [, its, entirely, possible, that, twitter, jus... [entirely, possible, twitter, watched, game, t... [its, that, just, of, and, to, itself] entirely possible twitter watched game thrones...
27333 3.41668e+17 6/3/2013 22:28 0 PAHarper 106702255 106702255 Phil Harper 3.41668e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/1267004068/... https://si0.twimg.com/profile_images/126700406... game+of+thrones <a href="http://twitter.com/download/iphone">T... RT @WstonesOxfordSt: It's entirely possible th... nan nan nan nan nan [] : It's entirely possible that Twitter just wat... Its entirely possible that Twitter just watch... [, its, entirely, possible, that, twitter, jus... [entirely, possible, twitter, watched, game, t... [its, that, just, of, and, to, itself] entirely possible twitter watched game thrones...
27334 3.41668e+17 6/3/2013 22:28 0 ShrimpWonder 365207282 365207282 boundarymembranes 3.41668e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/3672468820/... https://si0.twimg.com/profile_images/367246882... game+of+thrones <a href="http://twitter.com/">web</a> RT @WstonesOxfordSt: It's entirely possible th... nan nan nan nan nan [] : It's entirely possible that Twitter just wat... Its entirely possible that Twitter just watch... [, its, entirely, possible, that, twitter, jus... [entirely, possible, twitter, watched, game, t... [its, that, just, of, and, to, itself] entirely possible twitter watched game thrones...

27335 rows × 32 columns

In [866]:
!pip install textblob      #for installation

import textblob            #to import
from textblob import TextBlob
data['polarity'] = data.apply(lambda x: TextBlob(x['processed']).sentiment.polarity, axis=1)
data['subjectivity'] = data.apply(lambda x: TextBlob(x['processed']).sentiment.subjectivity, axis=1)
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: textblob in /home/fmx2hx/.local/lib/python3.7/site-packages (0.15.3)
Requirement already satisfied: nltk>=3.1 in /opt/conda/lib/python3.7/site-packages (from textblob) (3.6.1)
Requirement already satisfied: joblib in /opt/conda/lib/python3.7/site-packages (from nltk>=3.1->textblob) (1.0.1)
Requirement already satisfied: tqdm in /home/fmx2hx/.local/lib/python3.7/site-packages (from nltk>=3.1->textblob) (4.59.0)
Requirement already satisfied: regex in /opt/conda/lib/python3.7/site-packages (from nltk>=3.1->textblob) (2021.4.4)
Requirement already satisfied: click in /opt/conda/lib/python3.7/site-packages (from nltk>=3.1->textblob) (7.1.2)
In [867]:
positive_df = data.loc[data['polarity'] >= 0.5]
In [868]:
Overall_positive_sentiment = (len(positive_df)/len(data))
In [869]:
positive_df # I saw 1267 tweets are positive.
Out[869]:
id created_at created_at_shift from_user from_user_id from_user_id_str from_user_name id_str in_reply_to_status_id in_reply_to_status_id_str iso_language_code latitude longitude metadata place profile_image_url profile_image_url_https query source text to_user to_user_id to_user_id_str to_user_name type hashtag textclean textcleanpunct textlower removewtopwords stopwordsall processed polarity subjectivity
14 3.41612e+17 6/3/2013 18:45 0 EmKane89 26940059 26940059 Emily Kane 3.41612e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/1290024192/... https://si0.twimg.com/profile_images/129002419... #gameofthrones <a href="http://twitter.com/">web</a> Though I'd wake up in better spirits today, bu... nan nan nan nan nan ['#gameofthrones', '#shattered'] Though I'd wake up in better spirits today, bu... Though Id wake up in better spirits today but ... [though, id, wake, up, in, better, spirits, to... [though, id, wake, better, spirits, today, fac... [up, in, but, no, has, in, my] though id wake better spirits today fact ruine... 0.5000 0.500000
22 3.41612e+17 6/3/2013 18:45 0 OhKayKatOh 16725000 16725000 Katherina Oh 3.41612e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/3634903258/... https://si0.twimg.com/profile_images/363490325... #gameofthrones <a href="http://getglue.com">GetGlue.com</a> FAVOURITE TIME HAS ARRIVED. Everyone got me re... nan nan nan nan nan ['#gameofthrones', '#getglue'] OURITE TIME HAS ARRIVED. Everyone got me real ... OURITE TIME HAS ARRIVED Everyone got me real e... [ourite, time, has, arrived, everyone, got, me... [ourite, time, arrived, everyone, got, real, e... [has, me, for, this] ourite time arrived everyone got real excited ... 0.5625 1.000000
66 3.41612e+17 6/3/2013 18:45 0 PatrickvdSluis 178044429 178044429 Patrick van de Sluis 3.41612e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/1435756286/... https://si0.twimg.com/profile_images/143575628... #gameofthrones <a href="http://www.twitter.com">Twitter for W... If you ever think there will be a happy ending... nan nan nan nan nan ['#got', '#gameofthrones'] If you ever think there will be a happy ending... If you ever think there will be a happy ending... [if, you, ever, think, there, will, be, a, hap... [ever, think, happy, ending, havent, paying, a... [if, you, there, will, be, a, you] ever think happy ending havent paying attention 0.8000 1.000000
104 3.41612e+17 6/3/2013 18:45 0 starfishncoffee 18003520 18003520 starfishncoffee 3.41612e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/3739604325/... https://si0.twimg.com/profile_images/373960432... #redwedding <a href="http://twitter.com/">web</a> RT @Geniusbastard: I think what I find so fasc... nan nan nan nan nan ['#redwedding'] : I think what I find so fascinating about the... I think what I find so fascinating about the ... [, i, think, what, i, find, so, fascinating, a... [think, find, fascinating, seeing, ppl, respon... [i, what, i, so, about, the, is, to, the, that... think find fascinating seeing ppl respond real... 0.7000 0.850000
109 3.41611e+17 6/3/2013 18:44 0 PupsherLive 48903336 48903336 Journey 3.41611e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/3342462341/... https://si0.twimg.com/profile_images/334246234... #redwedding <a href="http://twitter.com/">web</a> RT @Geniusbastard: I think what I find so fasc... nan nan nan nan nan ['#redwedding'] : I think what I find so fascinating about the... I think what I find so fascinating about the ... [, i, think, what, i, find, so, fascinating, a... [think, find, fascinating, seeing, ppl, respon... [i, what, i, so, about, the, is, to, the, that... think find fascinating seeing ppl respond real... 0.7000 0.850000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
27186 3.41668e+17 6/3/2013 22:28 0 HJ_Symington 250607451 250607451 Harry Symington 3.41668e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/3501957420/... https://si0.twimg.com/profile_images/350195742... #gameofthrones <a href="http://twitter.com/download/iphone">T... I love #gameofthrones nan nan nan nan nan ['#gameofthrones'] I love I love [i, love] [love] [i] love 0.5000 0.600000
27188 3.41668e+17 6/3/2013 22:28 0 TrishoyaG 451031582 451031582 Trishoya Grant 3.41668e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/2644589547/... https://si0.twimg.com/profile_images/264458954... #gameofthrones <a href="http://www.tweetcaster.com">TweetCast... RT @ZacharyBailess: Well I'm just gonna sit in... nan nan nan nan nan ['#gameofthrones'] : Well I'm just gonna sit in the corner rockin... Well Im just gonna sit in the corner rocking ... [, well, im, just, gonna, sit, in, the, corner... [well, im, gonna, sit, corner, rocking, backwa... [just, in, the, myself, and, for, a, while, now] well im gonna sit corner rocking backwards for... 0.7000 0.600000
27200 3.41668e+17 6/3/2013 22:28 0 madebyboys 415150323 415150323 Alex Vosper 3.41668e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/3582090495/... https://si0.twimg.com/profile_images/358209049... #gameofthrones <a href="http://twitter.com/">web</a> RT @ChrisChilvers: You know an episode of TV h... nan nan nan nan nan ['#gameofthrones'] : You know an episode of TV has been good when... You know an episode of TV has been good when ... [, you, know, an, episode, of, tv, has, been, ... [know, episode, tv, good, credits, music] [you, an, of, has, been, when, the, has, no] know episode tv good credits music 0.7000 0.600000
27221 3.41668e+17 6/3/2013 22:27 0 KamalaJones 821706746 821706746 Kamala Jones 3.41668e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/2795147230/... https://si0.twimg.com/profile_images/279514723... #gameofthrones <a href="http://twitter.com/download/iphone">T... Now we all know why #gameofthrones had so damn... nan nan nan nan nan ['#gameofthrones'] Now we all know why had so damn many character... Now we all know why had so damn many character... [now, we, all, know, why, had, so, damn, many,... [know, damn, many, charactersso, could, kill] [now, we, all, why, had, so, they, them, all, ... know damn many charactersso could kill 0.5000 0.500000
27229 3.41668e+17 6/3/2013 22:28 0 sincap2 37803799 37803799 Kevin \ucf00\ube48 3.41668e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/3254738970/... https://si0.twimg.com/profile_images/325473897... #redwedding <a href="http://rockmelt.com">Rockmelt</a> The world gives feedback, and by paying attent... nan nan nan nan nan ['#got', '#redwedding'] The world gives feedback, and by paying attent... The world gives feedback and by paying attenti... [the, world, gives, feedback, and, by, paying,... [world, gives, feedback, paying, attention, to... [the, and, by, i, now, to, or] world gives feedback paying attention today mu... 0.5000 0.894444

1267 rows × 34 columns

In [870]:
all_words = ' '.join([text for text in positive_df['processed']]) # I wanted to look positive words in positive dataset
from wordcloud import WordCloud
wordcloud = WordCloud(width=800, height=500, random_state=21, max_font_size=110).generate(all_words)

plt.figure(figsize=(10, 7))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis('off')
plt.show()
In [871]:
Negative = data.loc[data['polarity'] <= -0.5] 
In [872]:
Negative # I saw 1832 tweets are negative.
Out[872]:
id created_at created_at_shift from_user from_user_id from_user_id_str from_user_name id_str in_reply_to_status_id in_reply_to_status_id_str iso_language_code latitude longitude metadata place profile_image_url profile_image_url_https query source text to_user to_user_id to_user_id_str to_user_name type hashtag textclean textcleanpunct textlower removewtopwords stopwordsall processed polarity subjectivity
10 3.41612e+17 6/3/2013 18:45 0 suppgab 60284029 60284029 Gab Gonzalez 3.41612e+17 nan nan tl nan nan result_type=recent nan http://a0.twimg.com/profile_images/3728220392/... https://si0.twimg.com/profile_images/372822039... #gameofthrones <a href="http://twitter.com/download/iphone">T... Wait puta sorry ang sakit ng puso ko :(((( #ga... nan nan nan nan nan ['#gameofthrones'] Wait puta sorry ang sakit ng puso ko Wait puta sorry ang sakit ng puso ko [wait, puta, sorry, ang, sakit, ng, puso, ko] [wait, puta, sorry, ang, sakit, ng, puso, ko] [] wait puta sorry ang sakit ng puso ko -0.500000 1.000000
12 3.41612e+17 6/3/2013 18:45 0 LauraSxoxo 45312927 45312927 Laura 3.41612e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/3660002363/... https://si0.twimg.com/profile_images/366000236... #gameofthrones <a href="http://www.echofon.com/">Echofon</a> RT @GoT_Tyrion: I had to call in sick to work ... nan nan nan nan nan ['#gameofthrones', '#therainsofcastamere'] : I had to call in sick to work today. They as... I had to call in sick to work today They aske... [, i, had, to, call, in, sick, to, work, today... [call, sick, work, today, asked, told, truth, ... [i, had, to, in, to, they, why, i, the, there,... call sick work today asked told truth death fa... -0.714286 0.857143
31 3.41612e+17 6/3/2013 18:45 0 JustinCos 285336168 285336168 Coz 3.41612e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/3459630270/... https://si0.twimg.com/profile_images/345963027... #gameofthrones <a href="http://twitter.com/download/android">... . @kaitlin_olson If the Starks formed an A-tea... nan nan nan nan nan ['#gameofthrones'] . If the Starks formed an A-team, would Arya b... If the Starks formed an Ateam would Arya be t... [, if, the, starks, formed, an, ateam, would, ... [starks, formed, ateam, would, arya, brains, u... [if, the, an, be, the, or, the] starks formed ateam would arya brains useless ... -0.500000 0.200000
44 3.41612e+17 6/3/2013 18:45 0 Hesiod2k11 264833204 264833204 Hesiod Theogeny 3.41612e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/3165004758/... https://si0.twimg.com/profile_images/316500475... #gameofthrones <a href="http://twitter.com/">web</a> Walder Frey vs The Governor? Who do you hate m... nan nan nan nan nan ['#gameofthrones', '#walkingdead', '#redwedding'] Walder Frey vs The Governor? Who do you hate m... Walder Frey vs The Governor Who do you hate mo... [walder, frey, vs, the, governor, who, do, you... [walder, frey, vs, governor, hate, tweet, resp... [the, who, do, you, more, your] walder frey vs governor hate tweet response -0.800000 0.900000
71 3.41612e+17 6/3/2013 18:45 0 Fara7_H 308092022 308092022 Glen Coco 3.41612e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/3572714321/... https://si0.twimg.com/profile_images/357271432... #gameofthrones <a href="http://twitter.com/download/iphone">T... WTF just happened!!!! #GameofThrones I can't ... nan nan nan nan nan ['#gameofthrones'] WTF just happened!!!! I can't believe this..wh... WTF just happened I cant believe thiswhyyyyyyyyy [wtf, just, happened, i, cant, believe, thiswh... [wtf, happened, cant, believe, thiswhyyyyyyyyy] [just, i] wtf happened cant believe thiswhyyyyyyyyy -0.500000 1.000000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
27181 3.41668e+17 6/3/2013 22:28 0 mikkyx 25370016 25370016 Michael Price 3.41668e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/3609728608/... https://si0.twimg.com/profile_images/360972860... #gameofthrones <a href="http://tapbots.com/tweetbot">Tweetbot... I have been rendered mute. Speechless. I can o... nan nan nan nan nan ['#gameofthrones'] I have been rendered mute. Speechless. I can o... I have been rendered mute Speechless I can onl... [i, have, been, rendered, mute, speechless, i,... [rendered, mute, speechless, communicate, twee... [i, have, been, i, can, only, by, just] rendered mute speechless communicate tweeting ... -0.800000 0.900000
27184 3.41668e+17 6/3/2013 22:28 0 Sersoker 184798052 184798052 Bryan Sersoker 3.41668e+17 nan nan es nan nan result_type=recent nan http://a0.twimg.com/profile_images/3131207635/... https://si0.twimg.com/profile_images/313120763... #gameofthrones <a href="http://twitter.com/">web</a> RT @Marysnow87: tio que no se me va la pena y ... nan nan nan nan nan ['#gameofthrones'] : tio que no se me va la pena y la mala leche ... tio que no se me va la pena y la mala leche c... [, tio, que, no, se, me, va, la, pena, y, la, ... [tio, que, se, va, la, pena, la, mala, leche, ... [no, me, y] tio que se va la pena la mala leche con el cap... -0.500000 1.000000
27185 3.41668e+17 6/3/2013 22:28 0 obrienbarry 101535178 101535178 Barry O'Brien 3.41668e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/3430200721/... https://si0.twimg.com/profile_images/343020072... #gameofthrones <a href="http://twitter.com/download/android">... An episode like that makes me wish I'd not rea... nan nan nan nan nan ['#gameofthrones'] An episode like that makes me wish I'd not rea... An episode like that makes me wish Id not read... [an, episode, like, that, makes, me, wish, id,... [episode, like, makes, wish, id, read, books, ... [an, that, me, not, the, i] episode like makes wish id read books hate spo... -0.800000 0.900000
27284 3.41668e+17 6/3/2013 22:28 0 jackrjthompson 148829450 148829450 Jack Thompson 3.41668e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/3747580484/... https://si0.twimg.com/profile_images/374758048... game+of+thrones <a href="http://twitter.com/">web</a> Youtube has suggested watching a Game of Thron... nan nan nan nan nan [] Youtube has suggested watching a Game of Thron... Youtube has suggested watching a Game of Thron... [youtube, has, suggested, watching, a, game, o... [youtube, suggested, watching, game, thrones, ... [has, a, of, i, but, this, will] youtube suggested watching game thrones themed... -0.550000 0.533333
27308 3.41668e+17 6/3/2013 22:28 0 BookishGirl92 190694770 190694770 Michaela 3.41668e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/3363126626/... https://si0.twimg.com/profile_images/336312662... game+of+thrones <a href="http://www.tumblr.com/">Tumblr</a> noahfoshaw: WHY WHY DID I START WATCHING GAME ... nan nan nan nan nan [] noahfoshaw: WHY WHY DID I START WATCHING GAME ... noahfoshaw WHY WHY DID I START WATCHING GAME O... [noahfoshaw, why, why, did, i, start, watching... [noahfoshaw, start, watching, game, thrones, w... [why, why, did, i, of, this, was, the, of, my] noahfoshaw start watching game thrones worst d... -0.700000 0.700000

1832 rows × 34 columns

In [873]:
# negative time line
In [874]:
Negative2=Negative[['created_at','processed']]
# I created time window 
Negative2['freq']=Negative2.groupby(by='created_at')['created_at'].transform('count')
/home/fmx2hx/.conda/envs/conda/lib/python3.7/site-packages/ipykernel/__main__.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  app.launch_new_instance()
In [875]:
Negative2['created_at'] = pd.to_datetime(Negative2['created_at'])
dataset_n = Negative2.set_index('created_at')
dataset_n.index
/home/fmx2hx/.conda/envs/conda/lib/python3.7/site-packages/ipykernel/__main__.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  if __name__ == '__main__':
Out[875]:
DatetimeIndex(['2013-06-03 18:45:00', '2013-06-03 18:45:00',
               '2013-06-03 18:45:00', '2013-06-03 18:45:00',
               '2013-06-03 18:45:00', '2013-06-03 18:44:00',
               '2013-06-03 18:44:00', '2013-06-03 18:43:00',
               '2013-06-03 18:39:00', '2013-06-03 18:45:00',
               ...
               '2013-06-03 22:27:00', '2013-06-03 22:28:00',
               '2013-06-03 22:28:00', '2013-06-03 22:28:00',
               '2013-06-03 22:28:00', '2013-06-03 22:28:00',
               '2013-06-03 22:28:00', '2013-06-03 22:28:00',
               '2013-06-03 22:28:00', '2013-06-03 22:28:00'],
              dtype='datetime64[ns]', name='created_at', length=1832, freq=None)
In [876]:
title_font= {"family" : "Cambria",
             "size" : 15,
             "color" : "black",
             "weight" : "bold"}

plt.rcParams.update({'figure.figsize': (10,6), 'figure.dpi': 120})

by_time = dataset_n.groupby(dataset_n.index.time).sum()
hourly_ticks = 2 * 60 * 60 *  np.arange(12)
by_time.plot(xticks=hourly_ticks, style='--o', color='brown')
plt.title('Frequency Per Hour for Negative Tweets', fontdict=title_font)
plt.xlabel('hour')
plt.ylabel('frequency')
plt.grid(axis='x')
plt.show();
In [877]:
all_words = ' '.join([text for text in Negative['processed']]) ## I wanted to look negative words in negative dataset
wordcloud = WordCloud(width=800, height=500, random_state=21, max_font_size=110).generate(all_words)

plt.figure(figsize=(10, 7))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis('off')
plt.show()
In [878]:
Neutral = data.loc[(data['polarity'] >= -0.5) & (data['polarity'] <= 0.5)]
In [879]:
Neutral  # I saw that 25239 tweets are neutral
Out[879]:
id created_at created_at_shift from_user from_user_id from_user_id_str from_user_name id_str in_reply_to_status_id in_reply_to_status_id_str iso_language_code latitude longitude metadata place profile_image_url profile_image_url_https query source text to_user to_user_id to_user_id_str to_user_name type hashtag textclean textcleanpunct textlower removewtopwords stopwordsall processed polarity subjectivity
0 3.41612e+17 6/3/2013 18:45 0 TheMadamEditor 337689639 337689639 madam-editor 3.41612e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/1448601184/... https://si0.twimg.com/profile_images/144860118... #gameofthrones <a href="http://twitter.com/download/iphone">T... About to watch #GameOfThrones and I am tweaked. nan nan nan nan nan ['#gameofthrones'] About to watch and I am tweaked. About to watch and I am tweaked [about, to, watch, and, i, am, tweaked] [watch, tweaked] [about, to, and, i, am] watch tweaked 0.000000 0.000000
1 3.41612e+17 6/3/2013 18:45 0 nitaselimi 421347539 421347539 Nita Selimi 3.41612e+17 3.41611e+17 3.41611e+17 tl nan nan result_type=recent nan http://a0.twimg.com/profile_images/3570330661/... https://si0.twimg.com/profile_images/357033066... #gameofthrones <a href="http://twitter.com/download/android">... @Grangjii gjith e kom dasht, veq ti je tu ma s... Grangjii 45957016.0 45957016.0 Granit Gjevukaj nan ['#gameofthrones'] gjith e kom dasht, veq ti je tu ma shti nqef e... gjith e kom dasht veq ti je tu ma shti nqef ed... [gjith, e, kom, dasht, veq, ti, je, tu, ma, sh... [gjith, kom, dasht, veq, ti, je, tu, shti, nqe... [ma, ma] gjith kom dasht veq ti je tu shti nqef edhe shume 0.000000 0.000000
2 3.41612e+17 6/3/2013 18:45 0 dh_editorial 256671039 256671039 Dee @ EditorialEyes 3.41612e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/1252656506/... https://si0.twimg.com/profile_images/125265650... #gameofthrones <a href="http://twitter.com/">web</a> Are there, like, House Stark/Tony Stark mashup... nan nan nan nan nan ['#gameofthrones', '#ironman'] Are there, like, House Stark/Tony Stark mashup... Are there like House StarkTony Stark mashups o... [are, there, like, house, starktony, stark, ma... [like, house, starktony, stark, mashups] [are, there, out, there, because, there, shoul... like house starktony stark mashups -0.200000 0.600000
3 3.41612e+17 6/3/2013 18:45 0 theprint 809334 809334 Rasmus Rasmussen 3.41612e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/1469678734/... https://si0.twimg.com/profile_images/146967873... #gameofthrones <a href="http://www.tweetdeck.com">TweetDeck</a> Reading #GameOfThrones reactions after last ni... nan nan nan nan nan ['#gameofthrones'] Reading reactions after last night's episode i... Reading reactions after last nights episode is... [reading, reactions, after, last, nights, epis... [reading, reactions, last, nights, episode, hi... [after, is, not, to, but, are] reading reactions last nights episode hilariou... -0.041667 0.688889
4 3.41612e+17 6/3/2013 18:45 0 Mr_Twenty2 69222052 69222052 Marty Caan 3.41612e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/1492961144/... https://si0.twimg.com/profile_images/149296114... #gameofthrones <a href="http://twitter.com/">web</a> I don't know if I'm impressed or disgusted! Br... nan nan nan nan nan ['#got', '#gameofthrones'] I don't know if I'm impressed or disgusted! Br... I dont know if Im impressed or disgusted Brill... [i, dont, know, if, im, impressed, or, disgust... [dont, know, im, impressed, disgusted, brillia... [i, if, or] dont know im impressed disgusted brilliant non... 0.300000 1.000000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
27330 3.41668e+17 6/3/2013 22:28 0 RocketQueen2x5 155242080 155242080 Funke Aleshe 3.41668e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/3653474295/... https://si0.twimg.com/profile_images/365347429... game+of+thrones <a href="http://twitter.com/download/android">... 30mins after the episode has finished, Game o... nan nan nan nan nan [] mins after the episode has finished, Game of T... mins after the episode has finished Game of Th... [mins, after, the, episode, has, finished, gam... [mins, episode, finished, game, thrones, taken... [after, the, has, of, has, all, my, and] mins episode finished game thrones taken feeli... -0.400000 0.400000
27331 3.41668e+17 6/3/2013 22:28 0 Brianpmohan 51492815 51492815 Brian Mohan 3.41668e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/3442035606/... https://si0.twimg.com/profile_images/344203560... game+of+thrones <a href="http://twitter.com/download/iphone">T... Actually fuming with game of thrones ha how sa... nan nan nan nan nan [] Actually fuming with game of thrones ha how sa... Actually fuming with game of thrones ha how sa... [actually, fuming, with, game, of, thrones, ha... [actually, fuming, game, thrones, ha, sad, must] [with, of, how, i, be] actually fuming game thrones ha sad must -0.300000 0.500000
27332 3.41668e+17 6/3/2013 22:28 0 WestJamUnited 334073484 334073484 Jamie 3.41668e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/3741402554/... https://si0.twimg.com/profile_images/374140255... game+of+thrones <a href="http://twitter.com/download/iphone">T... RT @WstonesOxfordSt: It's entirely possible th... nan nan nan nan nan [] : It's entirely possible that Twitter just wat... Its entirely possible that Twitter just watch... [, its, entirely, possible, that, twitter, jus... [entirely, possible, twitter, watched, game, t... [its, that, just, of, and, to, itself] entirely possible twitter watched game thrones... -0.200000 0.700000
27333 3.41668e+17 6/3/2013 22:28 0 PAHarper 106702255 106702255 Phil Harper 3.41668e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/1267004068/... https://si0.twimg.com/profile_images/126700406... game+of+thrones <a href="http://twitter.com/download/iphone">T... RT @WstonesOxfordSt: It's entirely possible th... nan nan nan nan nan [] : It's entirely possible that Twitter just wat... Its entirely possible that Twitter just watch... [, its, entirely, possible, that, twitter, jus... [entirely, possible, twitter, watched, game, t... [its, that, just, of, and, to, itself] entirely possible twitter watched game thrones... -0.200000 0.700000
27334 3.41668e+17 6/3/2013 22:28 0 ShrimpWonder 365207282 365207282 boundarymembranes 3.41668e+17 nan nan en nan nan result_type=recent nan http://a0.twimg.com/profile_images/3672468820/... https://si0.twimg.com/profile_images/367246882... game+of+thrones <a href="http://twitter.com/">web</a> RT @WstonesOxfordSt: It's entirely possible th... nan nan nan nan nan [] : It's entirely possible that Twitter just wat... Its entirely possible that Twitter just watch... [, its, entirely, possible, that, twitter, jus... [entirely, possible, twitter, watched, game, t... [its, that, just, of, and, to, itself] entirely possible twitter watched game thrones... -0.200000 0.700000

25239 rows × 34 columns

In [880]:
all_words = ' '.join([text for text in Neutral['processed']]) # I want to show words in neutral dataset tweets.
wordcloud = WordCloud(width=800, height=500, random_state=21, max_font_size=110).generate(all_words)

plt.figure(figsize=(10, 7))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis('off')
plt.show()